Genome Biology

Springer Science and Business Media LLC

All preprints, ranked by how well they match Genome Biology's content profile, based on 14 papers previously published here. The average preprint has a 0.05% match score for this journal, so anything above that is already an above-average fit. Older preprints may already have been published elsewhere.

1
A genome-wide association analysis of 2,622,830 individuals reveals new pathogenic pathways in gout.

Major, T. J.; Takei, R.; Matsuo, H.; Leask, M. P.; Topless, R. K.; Shirai, Y.; Li, Z.; Ji, A.; Cadzow, M. J.; Sumpter, N. A.; Merriman, M. E.; Phipps-Green, A. J.; Urquiaga, M.; Kelley, E. E.; Lewis, S. E.; Maxwell, B.; Wei, W.-H.; King, R. D.; McCormick, S. P.; Reynolds, R. J.; Saag, K. G.; Bixley, M. J.; Fadason, T.; O'Sullivan, J. M.; Stamp, L. K.; Dalbeth, N.; Abhishek, A.; Doherty, M.; Roddy, E.; Jacobsson, L. J.; Kapetonovic, M. C.; Melander, O.; Andres, M.; Perez Ruiz, F.; Torres, R.; Radstake, T.; Jansen, T. L.; Janssen, M.; Joosten, L. A.; Liu, R.; Gaal, O.; Crisan, T. O.; Rednic, S.;

2022-11-29 rheumatology 10.1101/2022.11.26.22281768
#1
238× avg

Gout is a chronic disease of monosodium urate crystal deposition in the setting of hyperuricemia that typically presents with recurrent flares of acute inflammatory arthritis that occur due to an innate immune response to deposited crystals. The molecular mechanism of the progression from hyperuricemia to clinical gout is poorly understood. Here we provide insights into this progression from a genetic study of 2.6 million people, including 120,282 people with gout. We detected 376 loci and 410 genetically independent signals (148 new loci in urate and gout). We identified 1,768 candidate genes with subsequent pathway analysis revealing urate metabolism, type 2 diabetes, and chromatin modification and structure as top pathways in gout. Genes located within or statistically linked to significant GWAS loci were prioritized for their potential to control the progression from hyperuricemia to gout. This identified strong candidate immune genes involved in epigenetic remodelling, cell osmolarity, and regulation of NLRP3-inflammasome activity. The genetic association signal at XDH, encoding the urate-producing enzyme xanthine oxidoreductase (XOR), co-localizes with genetic control of XDH expression, but only in the prostate. We demonstrate XOR activity and urate production in the mouse prostate, and use single-cell RNA sequence data to propose a model of urate reuptake, synthesis, and secretion by the prostate. The gout-associated loci were over-represented for genes implicated in clonal hematopoiesis of indeterminate potential (CHIP) and Mendelian randomization analysis provided evidence for a causal role of CHIP in gout. In concert with implication of epigenomic regulators, this provides support for epigenomic remodelling as causal in gout. We provide new insights into the molecular pathogenesis of gout and identify an array of candidate genes for a role in the inflammatory process of gout.

2
Leveraging DNA methylation to create Epigenetic Biomarker Proxies that inform clinical care: A new framework for Precision Medicine

Carreras Gallo, N.; Chen, Q.; Balague Dobon, L.; Aparicio, A.; Giosan, I. M.; Dargham, R.; Phelps, D.; Guo, T.; Mendez, K. M.; Chen, Y.; Carangan, A.; Vempaty, S.; Hassouneh, S.; McGeachie, M.; Mendez, T.; Comite, F.; Suhre, K.; Smith, R.; Dwaraka, V. B.; Lasky Su, J. A.

2024-12-08 health informatics 10.1101/2024.12.06.24318612
#1
127× avg

The lack of accurate, cost-effective, and clinically relevant biomarkers remains a major barrier to incorporating omic data into clinical practice. Previous studies have shown that DNA methylation algorithms have utility as surrogate measures for selected proteins and metabolites. We expand upon this work by creating DNAm surrogates, termed epigenetic biomarker proxies (EBPs), across clinical laboratories, the metabolome, and the proteome. After screening >2,500 biomarkers, we trained and tested 1,694 EBP models and assessed their incident relationship with 12 chronic diseases and mortality, followed up to 15 years. We observe broad clinical relevance: 1) there are 1,292 and 4,863 FDR significant incident and prevalent associations, respectively; 2) most of these associations are replicated when looking at the lab-based counterpart, and > 62% of the shared associations have higher odds and hazard ratios to disease outcomes than their respective observed measurements; 3) EBPs of current clinical biochemistries detect deviations from normal with high sensitivity and specificity. Longitudinal EBPs also demonstrate significant changes corresponding to the changes observed in lab-based counterparts. Using two cohorts and > 30,000 individuals, we found that EBPs validate across healthy and sick populations. While further study is needed, these findings highlight the potential of implementing EBPs in a simple, low-cost, high-yield framework that benefits clinical medicine.

3
NHAPL analysis of glycoRNA reveals sialic acid-containing glycosylated mRNA 3'UTRs and enables sensitive SLE diagnostics

Gui, J.; Zhang, M.; Kan, Z.; He, X.; Gao, M.; Han, J.; Wang, Q.; Zhang, S.; Hu, J.; Qin, W.; Bi, Z.; Huang, B.; Wu, Z.; Ran, J.

2026-02-09 rheumatology 10.64898/2026.02.05.26345357
#1
109× avg

GlycoRNAs, newly identified RNA molecules bearing glycan modifications on cell membranes, are implicated in cell communication and immune regulation. However, current technological limitations impede a thorough elucidation of their biological roles and clinical significance. Here, we developed Nucleotides Hybridization and Aptamer-based Proximity Ligation (NHAPL), a homogeneous assay enabling sensitive and quantitative glycoRNA analysis from 160 pg total cell RNA and 1 µl serum. NHAPL integrates dual recognition by a sialic acid aptamer and RNA binding probe, followed by ligation and qPCR amplification. We further established multiplexed NHAPL for simultaneous detection of multiple glycoRNAs. Using NHAPL, we uncover for the first time that protein-coding mRNAs, specifically 3' untranslated region (3'UTR) fragments of FNDC3B and CTSS, undergo sialic acid-containing N-glycosylation on the cell surface. These glycoRNAs functionally promote monocyte adhesion to endothelial cells and hepatoma cell migration, revealing a direct role in cell-cell interactions and cancer-related phenotypes. Applying multiplexed NHAPL to human serum, we identify glycoRNA signatures highly specific to systemic lupus erythematosus (SLE). In particular, glycoY5 and glycoU1 achieve near-complete discrimination between patients and healthy controls (area under the curve (AUC) = 1.00 and 0.9977), whereas conventional total RNA analysis fails to capture these differences, highlighting RNA glycosylation modification as a distinct regulatory layer. The simplicity and flexibility of NHAPL make it well suited for clinical glycoRNA profiling and biomarker discovery. Overall, NHAPL represents a robust and versatile platform for advancing glycoRNA research and diagnostic development.

4
Distinguishing causal from tagging enhancers using single-cell multiome data

Dorans, E.; Price, A. L.

2026-02-17 genetic and genomic medicine 10.64898/2026.02.15.26346353
#1
107× avg

Methods that analyze single-cell RNA-seq+ATAC-seq multiome data have shown promise in linking enhancers to target genes by correlating chromatin accessibility with gene expression across cells. However, correlations among ATAC-seq peaks may induce non-causal tagging peak-gene links (analogous to tagging associations in GWAS); indeed, we confirm that tagging effects induced by peak co-accessibility are pervasive in peak-gene linking. We defined two scores for each ATAC-seq peak: co-accessibility score, the sum of squared correlations with each nearby peak; and co-activity score, the sum of squared correlations with each nearby gene. We compared these scores in 4 multiome data sets (spanning 86k cells and 6 immune/blood cell types) and determined that co-accessibility score and co-activity score were strongly correlated across peaks (r = 0.57-0.73); these correlations were not explained by read depth, cell subtypes, or measurement noise, but are consistent with tagging. Indeed, non-causal peak-gene correlations were strongly correlated to a peak's tagging correlation with a causal peak in CRISPRi data (r = 0.92). We further determined that causal peak-gene associations are concentrated in specific functional categories of peaks, by regressing co-activity scores on stratified co-accessibility scores (S-CASC): e.g. 2.91x (s.e. 0.67) enrichment for peaks closest to a gene's TSS and 1.41x (s.e. 0.11) enrichment for peaks overlapping H3K27ac marks. Co-accessibility scores were substantially driven by the number of transcription factor binding sites (TFBS) within a peak, and peak-peak correlations were substantially driven by the number of TFBS pairs within the two peaks with a shared TF. These effects were concentrated in a small number of pioneer TFs, which activate repressed chromatin regions. Consistent with widespread tagging, peak-gene links that we fine-mapped using SuSiE significantly outperformed marginal peak-gene links in evaluation sets derived from CRISPRi and eQTL data. We provide examples demonstrating the impact of tagging effects at specific peaks and genes implicated in GWAS of blood cell traits. Our findings underscore the importance of accounting for tagging effects when linking enhancers to target genes.
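The two per-peak scores defined in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `tagging_scores`, the 500 kb `window` used to decide what counts as "nearby", and the use of plain Pearson correlation across cells are all assumptions made for illustration.

```python
import numpy as np

def tagging_scores(peak_acc, gene_expr, peak_pos, gene_pos, window=500_000):
    """Toy per-peak co-accessibility and co-activity scores.

    peak_acc  : (cells x peaks) accessibility matrix
    gene_expr : (cells x genes) expression matrix
    peak_pos, gene_pos : 1-D arrays of genomic coordinates (same chromosome)
    window    : hypothetical distance cutoff defining "nearby" (an assumption)
    Note: constant columns would yield NaN correlations and should be filtered first.
    """
    n_peaks = peak_acc.shape[1]
    # Pearson correlations between all features, computed across cells
    corr = np.corrcoef(np.hstack([peak_acc, gene_expr]), rowvar=False)
    r2_peak_peak = corr[:n_peaks, :n_peaks] ** 2
    r2_peak_gene = corr[:n_peaks, n_peaks:] ** 2

    # Boolean masks selecting nearby peaks and nearby genes for each peak
    near_peak = np.abs(peak_pos[:, None] - peak_pos[None, :]) <= window
    np.fill_diagonal(near_peak, False)  # a peak is not its own neighbour
    near_gene = np.abs(peak_pos[:, None] - gene_pos[None, :]) <= window

    co_accessibility = (r2_peak_peak * near_peak).sum(axis=1)
    co_activity = (r2_peak_gene * near_gene).sum(axis=1)
    return co_accessibility, co_activity
```

Under these assumptions, a peak that is perfectly correlated with one neighbouring peak and one neighbouring gene gets a score of 1 on both axes, which is the tagging pattern the abstract describes: high co-accessibility travelling together with high co-activity.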

5
Synovial Transcriptome Profiling for Predicting Biological Treatment Response in Rheumatoid Arthritis: A Feasibility study

D'Ailly, P.; Schaffers, O.; Deugd, C.; Versnel, M.; werken, H.; Bindels, E.; Tas, S.; Gribnau, J.; Schep, N.; Bisoendial, R.

2024-08-29 rheumatology 10.1101/2024.08.28.24312608
#1
107× avg

Introduction: Disease-Modifying Anti-Rheumatic Drug (DMARD) treatment fails to achieve clinical remission in a substantial proportion of patients with rheumatoid arthritis (RA). Patient-derived synovial tissue (ST)-signatures, thought to determine this heterogeneity of treatment responses, can be studied by single-cell RNA sequencing (scRNA-seq). Study aims: The first aim was to obtain viable ST from RA patients using wrist arthroscopy. The second aim was to identify patient-specific transcriptome signatures from the ST omics data that relate to clinical course and treatment responses in RA. Methods: Radiocarpal and midcarpal synovectomy was performed using a standard set-up wrist arthroscopy. Single-cell suspensions of ST from affected wrists of two RA patients and a control subject were processed on the 10X Genomics Chromium Platform. Seurat was used for downstream analysis. Results: In two RA patients and one non-inflammatory control, ST was successfully removed during wrist arthroscopy. No surgical complications occurred. For the RA patients and control, 17,176 and 7,884 high-quality cells were analyzed, respectively. Apart from enrichment of cell compartments in RA, including those of B- and plasma cells, T cell populations, NK cells, and macrophages, we observed interpatient variability that may influence the relationship between RA synovial signature and clinical phenotype, potentially also affecting treatment response and outcome. In-depth analysis of the prevailing cell-type abundance phenotype (CTAP) in the RA patients, as described previously, provided insights into the extent to which these CTAPs may be used to predict treatment responses. Conclusion: In this feasibility study, we demonstrated that wrist arthroscopy successfully retrieves ST with good tissue viability, which may provide informative and high-quality transcriptomic data for predicting therapy response at an individual level.

6
Accurate, sensitive, and efficient chromatin accessibility quantification at target loci using UNIChro-seq.

Kono, M.; Hatano, H.; Asahara, K.; Nakano, M.; Bagherzadeh, R.; Kawashima, T.; Arakawa, T.; Sato, M.; Inokuchi, H.; Nishino, T.; Itamiya, T.; Takahashi, H.; Natsumoto, B.; Suzuki, A.; Yamamoto, K.; Ishigaki, K.

2025-07-29 genetic and genomic medicine 10.1101/2025.07.29.25332340
#1
101× avg

Recent progress in statistical and experimental fine mapping of disease risk variants prompts us to focus on specific target loci for functional investigation. However, current genetics is hindered by a limited toolbox for target-loci analysis. To address this, we developed UNIChro-seq, a method that digitally counts accessible chromatin molecules at target loci. UNIChro-seq allows for accurate, sensitive, and efficient quantification of allelic effects compared to conventional methods. Using UNIChro-seq, we investigated the effects of 57 autoimmunity risk alleles on chromatin accessibility and estimated the causal effects of 20 artificial variants generated through genome editing. As a caveat, a non-negligible fraction of the edited alleles exhibited a falsely positive effect on chromatin accessibility, which can be effectively distinguished from the true causal effect through bi-directional genome editing. Finally, functional dissection of a fine-mapped risk variant at the LEF1 locus illuminated its impact on T cell pathology in rheumatoid arthritis. Together, these findings underscore the utility of combining UNIChro-seq with genome editing technology to enable precise and scalable functional analysis of disease-associated loci.

7
Global DNA methylomes reveal oncogenic-associated 5-hydroxylmethylated cytosine (5hmC) signatures in the cell-free DNA of cancer patients

Rech, G. E.; Lau, A. C.; Goldfeder, R. L.; Maurya, R.; Danilov, A. V.; Wei, C.-L.

2025-01-15 genetic and genomic medicine 10.1101/2025.01.09.25320283
#1
100× avg

Characterization of tumor epigenetic aberrations is integral to understanding the mechanisms of tumorigenesis and provides diagnostic, prognostic, and predictive information of high clinical relevance. Among the different tumor-associated epigenetic signatures, 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC) are the two most well-characterized DNA methylation alterations linked to cancer pathogenesis. 5hmC has a tissue-specific distribution and its abundance is subject to changes in tumor DNA, making it a promising biomarker. Detecting tumor-related DNA methylation alterations in tissues is highly invasive, while the analysis of cell-free DNA (cfDNA) is poised to supplement, if not replace, surgical biopsies. Although many studies have attempted to identify new epigenetic targets for liquid biopsy assays, little is known about the regulatory roles of 5hmC and its impact on molecular phenotypes in tumors. Most importantly, it is unknown whether the oncogenic-associated 5hmC signatures found in tumor tissues can be recapitulated in patients' cfDNA. In this study, we performed unbiased and simultaneous detection of 5mC and 5hmC whole-genome DNA modifications at base resolution in two distinct cancer cohorts, from patients with bladder cancer or B-cell lymphoma, their corresponding normal tissues, and cfDNAs from plasma. We analyzed tissue-specific methylation patterns and searched for signatures in gene coding and regulatory regions linked to cancerous states. We then looked for methylation signatures in patients' cfDNA to determine if they were consistent with the tumor-specific patterns. We determined the functional significance of 5hmC in tissue-specific transcription and uncovered hundreds of tumor-associated 5hmC signatures. These tumor-associated 5hmC changes, particularly in genes and enhancers, were functionally significant in tumorigenesis pathways and correlated with tumor-specific gene expression. To investigate if cfDNA is a faithful surrogate for tumor-associated 5hmC, we devised a targeted capture strategy to examine the alterations of 5hmC in cfDNA from patients with bladder cancer and lymphoma with sufficient sensitivity and specificity and confirmed that they recapitulated the patterns we observed in tumor tissues. Our results provide analytic validation of 5hmC as a cancer-specific biomarker. The methods described here for systematic characterization of 5hmC at functional elements open new avenues to discover epigenetic markers for non-invasive cancer diagnosis, monitoring, and stratification.

8
Airborne particulate matter enhances with monosodium urate crystals the secretion of IL-1β by human immune cells

Razazan, A.; Merriman, M.; Burden, N.; Reynolds, R.; Joosten, L. A.; Hussain, S.; Merriman, T.

2026-03-02 rheumatology 10.64898/2026.02.26.26347218
#1
99× avg

Gout is driven by an interleukin-1β-mediated intense innate immune reaction to monosodium urate (MSU) crystals (MSUc). In cell culture models of inflammatory gout there is a synergistic effect of phagocytosis of MSUc and TLR2 and TLR4 activation by agonists such as free fatty acid and lipopolysaccharide (LPS) in NLRP3-inflammasome activation and IL-1β secretion. A substantial number of gout patients do not report a dietary trigger, and observational studies associate airborne particulate matter with incident gout and flares. Airborne particulate matter contains LPS and airborne-derived particulate matter stimulates IL-1β secretion in cell culture. We hypothesized that airborne particulate matter could co-stimulate, with MSUc, IL-1β secretion and inflammation. We tested the hypothesis using MSUc with extracted airborne PM4 in human cells (the THP-1 monocyte cell line, primary human monocytes and PBMCs) or carbon black particles with ozone (CB+O3) in a murine foot-pad injection model of gout. There was strong NLRP3-inflammasome-dependent co-stimulation of IL-1β secretion in THP-1 cells with PM4+MSUc and a moderate additive effect in primary human PBMCs. However, there was no added effect on IL-1β secretion of PM4 in isolated primary human monocytes. Inhalation of CB+O3 persistently exacerbated MSUc-induced murine paw inflammation, with an increase of alveolar/lavage macrophages that contained CB+O3 particles and increased lavage expression of IL-1β. In conclusion, airborne-derived PM4 particulate matter enhanced MSUc-induced IL-1β secretion in THP-1 cells and PBMCs. Combined with exacerbation of MSUc-induced inflammation by fine particulate matter in in vivo experiments, these data provide evidence that exposure to fine particulate matter may play a role in the etiology of gout.

9
Inflammation induced epigenetic activation of bivalent genes in osteoarthritic cartilage

Du, H.; Zhou, Z.; Wang, Y.; Yu, X.; Huang, S.; Zhang, Y.; Yang, F.; Ding, B.-S.; You, X.; Wu, D.; Luo, Z.; Cai, Y.; Lu, H.; Liao, Z.; Zhao, Y.; Gan, F.; Ning, N.; Zeng, J.; Xiao, K.

2023-04-20 rheumatology 10.1101/2023.04.13.23288509
#1
98× avg

Osteoarthritis (OA) is the most prevalent joint disorder occurring with articular cartilage degradation, which includes a switch from an articular to a growth-plate chondrocyte phenotype. Epigenetics serves as a new therapeutic target but histone modification changes in OA remain elusive. Here, we investigated the profiles of four histone modifications in normal and OA chondrocytes. The repressive mark H3K27me3 was significantly lost in OA, associated with up-regulated gene expression. Surprisingly, many of these genes were occupied by both H3K27me3 and H3K4me3 in normal chondrocytes, showing a poised bivalent state. These bivalent genes are deemed to be activated during the hypertrophy of growth plate chondrocytes. Furthermore, inflammation induced the expression of demethylase KDM6B and decreased H3K27me3 level in OA chondrocytes, which was rescued by the KDM6B inhibitor GSK-J4. Altogether, our results suggest an inherited bivalent epigenetic signature on developmental genes that makes articular chondrocytes prone to hypertrophy and contribute to a promising epigenetic therapy for OA.

The Paper Explained. Problem: Osteoarthritis (OA) affects as much as 40% of the elderly population, representing the largest cause of age-related disability. The high susceptibility to OA suggests an intrinsic and systemic characteristic in articular chondrocytes that makes cartilage prone to degeneration. Results: Epigenetic bivalent genes, which are occupied with both H3K27me3 and H3K4me3, are considered to poise expression of developmental genes. Surprisingly, we reported bivalency for hypertrophy-related genes in normal articular chondrocytes. These bivalent genes need to be activated in growth plate chondrocytes for extracellular matrix degradation and ossification, but are left as a "bomb" for degeneration in articular chondrocytes. We further found that inflammation-induced KDM6B removes H3K27me3 to activate hypertrophy-related genes that promote OA. Impact: Our results suggest an inherited epigenetic signature that makes articular chondrocytes prone to hypertrophy and ossification and contribute to a promising epigenetic therapy for OA.

10
exRNA Signatures in Extracellular Vesicles and their Tumor-Lineage from Prostate Cancer

Dogra, N.; Ahsen, M. E.; Kozlova, E. G.; Chen, T.-Y.; Allette, K.; Olsen, R.; Han, D.; Kim, S.; Gifford, S. M.; Smith, J. T.; Wunsch, B. H.; Weil, R.; Bhatt, K.; Yadav, K. K.; Vlachos, K.; Nair, S.; Gordon, R. E.; Smith, M.; Sebra, R.; Margolin, A.; Sahoo, S.; Tewari, A.; Cordon-Cardo, C.; Losic, B.; Stolovitzky, G. A.

2020-09-30 genetic and genomic medicine 10.1101/2020.09.28.20190009
#1
95× avg

Circulating extracellular vesicles (EVs) present in the bodily fluids of patients with cancer may provide non-invasive access to the tumor tissue. Yet, the transcriptomic lineage of tumor-derived EVs before and after tumor resection remains poorly understood. Here, we established 60 total small RNA-sequencing profiles from 17 aggressive prostate cancer (PCa) patients' tumor and adjacent normal tissue, and EVs isolated from urine, serum, and cancer cell culture media. We interrogated the key satellite alteration in tumor-derived EVs and found that resection of tumor prostate tissue leads to differential expression of reactive oxygen species (ROS), P53 pathways, inflammatory/cytokines, oncogenes, and tumor suppressor genes in the EV nanosatellites. Furthermore, we provide a set of novel EV-specific RNA signatures, which are present in cancer but are nonexistent in post-resection patients with undetectable cancer. Finally, using a de novo RNAseq assembly followed by characterization of the small RNA landscape, we found novel small RNA clusters (smRCs) in the EVs, which reside in the unannotated regions. Novel smRCs were orthogonally validated for their differential expression in the biomarker discovery cohort using RT-qPCR. We demonstrate that circulating tumor EVs provide a glimpse of the tumor tissue biology, resolving a major bottleneck in the current liquid biopsy efforts. Secretory vesicles appear to be playing a key role in non-canonical Wnt signaling and miRNA pathways, similar to the circulating tumor cells (CTCs); hence, we propose that such vesicles be called circulating tumor extracellular vesicles (CTEVs).

11
Flexible and efficient count-distribution and mixed-model methods for eQTL mapping with quasar

Pullin, J. M.; Wallace, C.

2025-07-17 genetic and genomic medicine 10.1101/2025.07.17.25331702
#1
93× avg

Identifying genetic variants that affect gene expression, expression quantitative trait loci (eQTLs), is a major focus of modern genomics. Today, various methods exist for eQTL mapping, each using different statistical and methodological approaches. However, it is unclear which approaches lead to better performance, and challenges, particularly scalability as datasets continue to increase in size, remain. Here, we introduce quasar, a flexible and efficient C++ software program for eQTL mapping. Compared to existing eQTL mapping methods, quasar implements a wider variety of statistical models, including the linear model, Poisson and negative binomial generalised linear models, linear mixed model and Poisson and negative binomial generalised linear mixed models. Methodologically, we introduce and implement a simple, analytic approximation to the score test variance in mixed models. Furthermore, we highlight that difficulties with accurately estimating the negative binomial dispersion parameter, previously identified in the context of RNA-seq differential expression analysis, also apply to eQTL mapping. Therefore, quasar implements the Cox-Reid adjusted profile likelihood which enables unbiased estimation of the negative binomial dispersion parameter. We assess quasar's performance and compare it to three existing eQTL mapping methods: apex, jaxQTL and tensorQTL, on the OneK1K dataset. We demonstrate that quasar's output agrees with established methods where their models align but that quasar is at least 30% and up to 25 times faster. We exploit the range of models implemented in quasar to compare statistical models for eQTL mapping without confounding by implementation. We find that: count-based models have higher power, mixed models do not show better performance in a dataset without substantial relatedness, and the adjusted profile likelihood improves Type 1 error control when using the negative binomial distribution. Additionally, we investigate the relative performance of Poisson and negative binomial mixed models and the use of different approaches for gene-level FDR control. Overall, quasar provides a performant and versatile program for eQTL mapping and we nominate the negative binomial GLM, incorporating adjusted profile likelihood dispersion estimation, as the statistical model with the best performance.

12
Genetically informed search for potential osteoarthritis drug targets across the proteome

Liu, W.; Zuckerman, B. P.; Schuermans, A.; Orozco, G.; Honigberg, M. C.; Bowes, J.; O'Neill, T. W.; Zhao, S. S.

2026-02-11 rheumatology 10.64898/2026.02.10.26345885
#1
92× avg

Background: Osteoarthritis (OA) is a leading cause of disability worldwide, yet no licensed therapies can prevent or slow its progression. We aimed to identify potential targets for disease-modifying OA drugs (DMOADs) by integrating genetic and differential protein expression (DPE) evidence. Methods: We evaluated genetically predicted perturbations of plasma protein levels using cis-protein quantitative trait loci (cis-pQTLs) across three large European cohorts (UK Biobank Pharma Proteomics Project, deCODE, and Fenland) and outcome data from the Genetics of Osteoarthritis Consortium, covering 11 OA phenotypes. DPE analyses were performed in 44,789 UKB participants, comparing 2,920 protein measurements between OA cases and controls, supported by sensitivity analyses. Proteins identified through genetic and/or DPE approaches were further assessed in downstream analyses. Findings: In total, 305 proteins showed evidence of association with OA through genetically predicted perturbations, with 81 supported by colocalisation across datasets. DPE analyses identified 605 proteins associated with at least one OA phenotype, of which 450 (74·4%) remained robust after sensitivity testing. Several novel targets were identified, including PPP1R9B, PCSK7, and ITIH4. Integration of both approaches prioritised 5 proteins, 4 of which demonstrated druggable potential, including 3 high-confidence candidates: DLK1, TNFRSF9, and OGN. Downstream analyses highlighted key biological pathways and candidate compounds with potential for repurposing. Interpretation: This large-scale study combines genetic and DPE evidence to prioritise candidate DMOAD targets. Findings reinforce established biology while revealing novel proteins and pathways, providing a foundation for therapeutic development in OA. Funding: WL is supported by the Guangzhou Elite Project (project no. JY202314). SSZ is supported by The University of Manchester Dean's Prize, Arthritis UK Career Development Fellowship (grant no. 23258). This work is supported by the NIHR Manchester Biomedical Research Centre (NIHR203308).

Research in context. Evidence before this study: Circulating proteins have been linked to osteoarthritis (OA) in observational studies, supporting their potential as biomarkers and drug targets. However, differential protein expression analyses are vulnerable to confounding and reverse causation. Mendelian randomisation (MR) studies using proteomic GWAS instruments have suggested causal roles for several circulating proteins in OA-related traits and highlighted druggable candidates. However, many analyses relied on earlier OA GWAS data (e.g., Genetics of Osteoarthritis Consortium 1·0) and smaller proteomic GWAS datasets, and typically did not integrate MR findings with large-scale differential protein expression. As a result, it remains unclear how well genetically predicted protein effects align with observed protein expression in OA, and how robust prioritised targets are when replicated across proteomic data from multiple cohorts. Added value of this study: This study integrates large-scale proteomic MR and differential protein expression (DPE) analyses across multiple OA phenotypes using the largest datasets to date. By combining genetic evidence with observed protein dysregulation in population-based cohorts, we strengthen causal inference and improve robustness of target prioritisation. This approach allows us to distinguish proteins that are likely to play a causal role in OA from those that reflect downstream disease processes, and to highlight targets with greater translational relevance than identified by either method alone. Implications of all the available evidence: Taken together, our findings support a causal role for a subset of circulating proteins in OA and demonstrate the value of integrating genetic and observational proteomic data for target prioritisation. Proteins supported by both MR and DPE are more likely to represent biologically relevant drivers of disease and actionable therapeutic targets. This integrated framework reduces false positives arising from confounding or reverse causation and provides a more reliable basis for drug development, biomarker discovery, and patient stratification in OA.

13
New Genetic Insights in Rheumatoid Arthritis using Taxonomy3(R), a Novel method for Analysing Human Genetic Data

Kozlowska, J.; Humphryes-Kirilov, N.; Pavlovets, A.; Connolly, M.; Kuncheva, Z.; Horner, J.; Sousa Manso, A.; Murray, C.; Fox, J. C.; McCarthy, A.

2023-02-24 rheumatology 10.1101/2023.02.21.23286176
#1
92× avg

Genetic support for a drug target has been shown to increase the probability of success in drug development, with the potential to reduce attrition in the pharmaceutical industry alongside discovering novel therapeutic targets. It is therefore important to maximise the detection of genetic associations that affect disease susceptibility. Conventional statistical methods used to analyse genome-wide association studies (GWAS) only identify some of the genetic contribution to disease, so novel analytical approaches are required to extract additional insights. C4X Discovery has developed a new method Taxonomy3(R) for analysing genetic datasets based on novel mathematics. When applied to a previously published rheumatoid arthritis GWAS dataset, Taxonomy3(R) identified many additional novel genetic signals associated with this autoimmune disease. Follow-up studies using tool compounds support the utility of the method in identifying novel biology and tractable drug targets with genetic support for further investigation.

14
Representation learning for multi-modal spatially resolved transcriptomics data

Nonchev, K.; Andani, S.; Ficek-Pascual, J.; Nowak, M.; Sobottka, B.; Tumor Profiler Consortium; Koelzer, V. H.; Raetsch, G.

2024-06-04 health informatics 10.1101/2024.06.04.24308256
#1
91× avg

Spatial transcriptomics enables in-depth molecular characterization of samples on a morphology and RNA level while preserving spatial location. Integrating the resulting multi-modal data is an unsolved problem, and developing new solutions in precision medicine depends on improved methodologies. Here, we introduce AESTETIK, a convolutional deep learning model that jointly integrates spatial, transcriptomics, and morphology information to learn accurate spot representations. AESTETIK yielded substantially improved cluster assignments on widely adopted technology platforms (e.g., 10x Genomics, NanoString) across multiple datasets. We achieved performance enhancement on structured tissues (e.g., brain) with a 21% increase in median ARI over previous state-of-the-art methods. Notably, AESTETIK also demonstrated superior performance on cancer tissues with heterogeneous cell populations, showing a two-fold increase in breast cancer, 79% in melanoma, and 21% in liver cancer. We expect that these advances will enable a multi-modal understanding of key biological processes.

15
Benchmarking methods integrating GWAS and single-cell transcriptomic data for mapping trait-cell type associations

Zeng, J.; Wray, N. R.; Li, A.; Lin, T.; Walker, A.; Tan, X.; Zhao, R.; Yao, S.; Hjerling-Leffler, J.; Sullivan, P. F.

2025-05-25 genetic and genomic medicine 10.1101/2025.05.24.25328275
#1
91× avg

Genome-wide association studies (GWAS) have discovered numerous trait-associated variants, but their biological context remains unclear. Integrating GWAS summary statistics with single-cell RNA-sequencing expression profiles can help identify the cell types in which these variants influence traits. Two main strategies have been developed to integrate these data types. The "single cell to GWAS" strategy (representing most methods) identifies gene sets with cell-type-specific expression and then follows with enrichment analyses applied to GWAS summary statistics. Conversely, the "GWAS to single cell" strategy begins with a list of trait-associated genes and calculates a cumulative disease score per cell based on gene expression count data. We systematically evaluated 19 approaches versus "ground truth" trait-cell type pairs to assess their statistical power and false positive rates. Based on these analyses, we draw seven key conclusions to guide future studies. We also propose a Cauchy approach to combine the two main strategies to maximize power for detecting trait-cell type associations.
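
The abstract does not spell out the Cauchy approach. As a minimal sketch, assuming it resembles the standard Cauchy combination test (an assumption, not something the abstract confirms), p-values for the same trait-cell type pair from the two strategies could be combined as shown here. The function name `cauchy_combine` and the equal default weights are illustrative choices, not taken from the preprint.

```python
import math


def cauchy_combine(p_values, weights=None):
    """Combine p-values with the Cauchy combination test.

    Hypothetical helper for illustration; equal weights are assumed
    when none are given.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p < 1.0) for p in p_values):
        raise ValueError("p-values must lie strictly between 0 and 1")
    if weights is None:
        weights = [1.0 / len(p_values)] * len(p_values)
    if len(weights) != len(p_values) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must match p_values and sum to 1")

    # Map each p-value to a standard Cauchy variate and take the weighted sum.
    t = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, p_values))

    # Convert the statistic back to a p-value using the Cauchy tail.
    return 0.5 - math.atan(t) / math.pi
```

With equal weights, a call such as `cauchy_combine([p_sc_to_gwas, p_gwas_to_sc])` (both names hypothetical) returns one combined p-value per trait-cell type pair. A single strong signal from either strategy tends to dominate the result, which is one reason this kind of combination is used when it is unclear in advance which strategy will be more powerful.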

16
Complexity of the neutrophil transcriptome in early and severe rheumatoid arthritis. A role for microRNAs?

Fresneda Alarcon, M.; Abdullah, G. A.; Beggs, J. A.; Kynoch, I.; Sellin, A.; Cross, A.; Heldenby, S.; Antczak, P.; Caamano Gutierrez, E.; Wright, H. L.

2024-12-13 rheumatology 10.1101/2024.12.12.24318900
#1
90× avg

Neutrophils are innate immune cells that drive the progression of rheumatoid arthritis (RA) through the release of reactive oxygen species (ROS), neutrophil extracellular traps (NETs) and proteases that damage host tissues. Neutrophil activation is regulated, in part, by dynamic changes in gene expression. In this study we have used RNAseq to measure the transcriptomes of neutrophils from people with severe, methotrexate-refractory RA and healthy controls. We identified a dynamic gene expression profile in people with severe RA. This is dominated by a type-I interferon-induced gene expression signature as well as activation of genes regulating neutrophil degranulation, NET production, response to ROS and oxidative stress. Whilst we did not detect significantly elevated levels of interferon-alpha in RA blood sera, we identified increased expression in RA neutrophils of miR-96-5p and miR-183-5p which regulate activation of the interferon pathway as members of the miR-183C cluster. We also detected significantly elevated NET debris in RA blood sera (p<0.05). Using gene set variation analysis we explored the heterogeneity of neutrophil gene expression in RA and identified subsets of patients with gene expression profiles reflecting enhanced neutrophil degranulation and cytotoxicity, tissue inflammation or activation by interferons. Comparison with published single-cell RNAseq datasets identified RA transcriptomes where neutrophils were polarised by genes relating to early or late cell maturity, with significant genes in each polarised state being regulated by miR-146a-5p, miR-155-5p, miR-183-5p or miR-96-5p. Overall our study demonstrates the heterogeneity of the RA neutrophil transcriptome and proposes miRNA-driven mechanisms for regulating the activated neutrophil phenotype in RA.

17
Probing epigenetic clocks as a rational markers of biological age using blood cell counts

Jonkman, T. H.; van Zwet, E. W.; Heijmans, B. T.

2025-05-13 genetic and genomic medicine 10.1101/2025.05.12.25327213
#1
89× avg

Background: Epigenetic clocks are widely applied biomarkers of biological age, but their biological underpinnings remain unclear. We previously showed that epigenetic clocks are affected by naive and memory T cell proportions, suggesting blood cell composition as a potential driver. Methods: Here, we quantify the contribution of cell counts to DNA methylation age (DNAmAge) and age acceleration (AgeAccel) estimated by six 1st- or 2nd-generation epigenetic clocks. First, we present a principal component analysis (PCA) method that is robust to collinearity of blood cell counts and show this provides biologically meaningful insights. Results: Applying this approach, we find strong associations between DNAmAge and cell counts, particularly with naive and memory T cells. In contrast, associations between AgeAccel and cell counts are weaker and, particularly for 2nd-generation clocks, primarily involve neutrophils. We validate these findings in an external dataset of artificial cell mixtures. Conclusions: We conclude that DNAmAge and AgeAccel reflect different biological processes.

18
Interrogation of the Perturbed Gut Microbiota in Gouty Arthritis Patients Through in silico Metabolic Modeling

Henson, M. A.

2020-09-03 genetic and genomic medicine 10.1101/2020.09.02.20187013
#1
86× avg

Recent studies have shown perturbed gut microbiota associated with gouty arthritis, a metabolic disease in which an imbalance between uric acid production and excretion leads to the deposition of uric acid crystals in joints. To mechanistically investigate altered microbiota metabolism in gout disease, 16S rRNA gene amplicon sequence data from stool samples of gout patients and healthy controls were computationally analyzed through bacterial community metabolic modeling. Patient-specific models were used to cluster samples according to their metabolic capabilities and to generate statistically significant partitioning of the samples into a Bacteroides-dominated, high gout cluster and a Faecalibacterium-elevated, low gout cluster. The high gout cluster samples were predicted to allow elevated synthesis of the amino acids D-alanine and L-alanine and byproducts of branched-chain amino acid catabolism, while the low gout cluster samples allowed higher production of butyrate, the sulfur-containing amino acids L-cysteine and L-methionine and the L-cysteine catabolic product H2S. The models predicted an important role for metabolite crossfeeding, including the exchange of acetate, D-lactate and succinate from Bacteroides to Faecalibacterium to allow higher butyrate production differences than would be expected based on taxa abundances in the two clusters. The surprising result that the high gout cluster could underproduce H2S despite having a higher abundance of H2S-synthesizing bacteria was rationalized by reduced L-cysteine production from Faecalibacterium in this cluster. Model predictions were not substantially altered by constraining uptake rates with different in silico diets, suggesting that sulfur-containing amino acid metabolism generally and H2S more specifically could be novel gout disease markers.

19
5-hydroxymethylcytosine sequencing in plasma cell-free DNA identifies unique epigenomic features in prostate cancer patients resistant to androgen deprivation therapy

Li, Q.; Huang, C.-C.; Huang, S.; Tian, Y.; Huang, J.; Bitaraf, A.; Dong, X.; Nevalanen, M. T.; Zhang, J.; Manley, B. J.; Park, J. Y.; Kohli, M.; Gore, E. M.; Kilari, D.; Wang, L.

2023-10-16 genetic and genomic medicine 10.1101/2023.10.13.23296758
#1
85× avg

Background: Currently, no biomarkers are available to identify resistance to androgen-deprivation therapies (ADT) in men with hormone-naive prostate cancer. Since 5-hydroxymethylcytosines (5hmC) in gene bodies are associated with gene activation, we evaluated whether 5hmC signatures in cell-free DNA (cfDNA) predict early resistance to ADT. Results: We collected a total of 139 serial plasma samples from 55 prostate cancer patients receiving ADT at three time points: baseline (prior to initiating ADT, N=55), 3 months (after initiating ADT, N=55), and disease progression within 24 months (N=15), or 24 months if no progression was detected (N=14). To quantify 5hmC abundance across the genome, we used selective chemical labeling sequencing and mapped sequence reads to individual genes. Differential methylation analysis in baseline samples identified significant 5hmC differences in 1,642 of 23,433 genes between patients with and without progression (false discovery rate, FDR<0.1). Patients with disease progression showed significant 5hmC enrichments in multiple hallmark gene sets, with androgen responses as the top enriched gene set (FDR=1.19E-13). Interestingly, this enrichment was driven by a subgroup of patients featuring significant 5hmC hypermethylation in the gene sets involving AR, FOXA1 and GRHL2. To quantify overall activities of these gene sets, we developed a gene set activity scoring algorithm and observed a significant association of high activity scores with poor progression-free survival (P<0.05). Longitudinal analysis showed that the high activity scores were significantly reduced 3 months after initiating ADT (P<0.0001) but returned to higher levels when the disease progressed (P<0.05).
Conclusions: This study demonstrates that 5hmC-based activity scores from gene sets involved in AR, FOXA1 and GRHL2 may be used as biomarkers to determine early treatment resistance, monitor disease progression, and potentially identify patients who would benefit from upfront treatment intensification.

20
A variational sparse Gaussian-process method for detecting spatially variable genes and cellular interactions from spatial transcriptomics

Wang, Z.; Xie, L.; Wang, Y.; Wang, Y.; Wang, T.; Shang, X.; Li, J.; Hu, J.

2025-12-11 genetic and genomic medicine 10.64898/2025.12.10.25341956
#1
85× avg

Advanced spatially resolved transcriptomic (SRT) technologies preserve the spatial context of gene expression within tissues, enabling the study of context-dependent transcriptional regulation. Here, we propose VISGP, a variational sparse Gaussian-process method for analysing spatially variable genes (SVGs) and cellular interactions in such data. VISGP utilizes variational inference and a sparse Gaussian process approximation, which efficiently models the posterior distribution with a set of inducing variables, thereby minimizing computational and memory complexity. When applied to simulated data and four real data sets, VISGP identified more SVGs than existing methods and detected 85 spatially constrained ligand-receptor pairs that are missed by other methods. Together, VISGP provides a powerful strategy for decoding spatial gene regulation and cellular interactions, offering valuable biological insights into cellular heterogeneity and cancer pathology. Teaser: A statistical framework that uncovers spatially variable genes and cell-cell communication, revealing hidden biological architecture in tissues.